GitHub Repository: debakarr/machinelearning
Path: blob/master/Part 2 - Regression/Polynomial Regression/[Python] Polynomial Regression.ipynb

Kernel: Python 3

Polynomial Regression

In [2]:

from IPython.display import Image

In [3]:

Image('img/01.png')

Out[3]:

In [5]:

Image('img/02.png')

Out[5]:

Data Preprocessing

In [13]:

# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
%matplotlib inline
plt.rcParams['figure.figsize'] = [14, 8]

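# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
# Level column only; the 1:2 slice keeps X two-dimensional, as scikit-learn expects
X = dataset.iloc[:, 1:2].values
# Salary column as a 1D target vector
y = dataset.iloc[:, 2].values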

In [94]:

dataset

Out[94]:

Problem Statement: We are a human resources team working for a big company, and we are about to hire a new employee. The candidate seems to be a good fit for the job, and we are about to make an offer. Now it is time to negotiate the new employee's future salary.

At the beginning of the negotiation, the candidate says that he has 20+ years of experience and eventually earned a 160K annual salary at his previous company, so he is asking for more than 160K.

However, someone on the HR team is a bit of a control freak who has always fantasized about being a detective, and suddenly decides to call the previous employer to check the claimed 160K annual salary. Unfortunately, all this person manages to get is a simple table of salaries for ten different positions at the previous company.

This team member then runs a simple analysis in Excel or Google Sheets and observes that there is a non-linear relationship between the position levels and their associated salaries.

The HR person did, however, get one other very relevant piece of information: the new employee has been a region manager for two years, and it takes on average four years to move from region manager to partner.

So this employee was about halfway between level 6 and level 7 (two of the four years, so 6 + 2/4), and we can therefore say he was at level 6.5.

Now this HR person is getting excited, telling the team that he can build a bluffing detector using regression models and predict whether the new employee is bluffing about his salary.

At first the team finds this a little weird, but is curious to see what happens. So here is the mission:

The new employee says that his annual salary was 160K. Let's predict whether that is truth or bluff by building a bluffing detector using polynomial regression.

In [14]:

X

Out[14]:

array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10]])

In [15]:

y

Out[15]:

array([  45000,   50000,   60000,   80000,  110000,  150000,  200000,
        300000,  500000, 1000000])

In [68]:

plt.scatter(X, y)
plt.title('Salary vs Level')
plt.xlabel('Level')
plt.ylabel('Salary')

Out[68]:

Text(0,0.5,'Salary')

We can see a non-linear relationship between Salary and Level: salaries rise slowly at the lower levels and much more steeply at the top.
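As a quick numeric check of what the scatter plot suggests, the sketch below looks at the salary increase from one level to the next. It is only an illustration: the salaries are copied from Out[15] above, and the name `steps` is introduced here, not taken from the notebook.

```python
import numpy as np

# Salaries for levels 1..10, copied from Out[15]
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Salary increase from each level to the next
steps = np.diff(y)
print(steps)

# Under a straight-line relationship these increases would be roughly constant;
# here they never shrink and the last one is 100 times the first
print(np.all(np.diff(steps) >= 0))
print(steps[-1] / steps[0])
```

Increases that grow this much from one level to the next are what a single straight line cannot capture, which motivates adding polynomial terms.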

Fitting Linear Regression to the dataset

In [72]:

lin_reg = LinearRegression()
lin_reg.fit(X, y)

Out[72]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

Fitting Polynomial Regression to the dataset

In [74]:

poly_reg = PolynomialFeatures(degree=2)
# New feature matrix with one column per power of the level: 1, x, x^2
X_poly = poly_reg.fit_transform(X)

In [75]:

X_poly

Out[75]:

array([[   1.,    1.,    1.],
       [   1.,    2.,    4.],
       [   1.,    3.,    9.],
       [   1.,    4.,   16.],
       [   1.,    5.,   25.],
       [   1.,    6.,   36.],
       [   1.,    7.,   49.],
       [   1.,    8.,   64.],
       [   1.,    9.,   81.],
       [   1.,   10.,  100.]])
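For a single input feature and degree 2, the matrix above has the columns 1, x and x^2, so the model fitted on it has the form y = b0 + b1*x + b2*x^2. The sketch below rebuilds the same matrix by hand with NumPy, purely to illustrate what the transform produces. `X_manual` is a name introduced here, and the levels are copied from Out[14].

```python
import numpy as np

# Levels 1..10 as a column vector, copied from Out[14]
X = np.arange(1, 11).reshape(-1, 1)

# Stack the powers x^0, x^1, x^2 side by side, one column per power
X_manual = np.hstack([X**0, X**1, X**2]).astype(float)
print(X_manual)
```

The leading column of ones acts as a bias term. LinearRegression also fits its own intercept by default (fit_intercept=True, as shown in its output), so that column is redundant here but harmless.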

In [76]:

# Fit a second linear model on the polynomial features from poly_reg
lin_reg_2 = LinearRegression()
lin_reg_2.fit(X_poly, y)

Out[76]:

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=1, normalize=False)

Visualising the Linear Regression results

In [77]:

plt.scatter(X, y, c = 'red')
plt.plot(X, lin_reg.predict(X), c = 'green')
plt.title('Truth or Bluff (Linear Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')

Out[77]:

Text(0,0.5,'Salary')

Visualising the Polynomial Regression results

In [78]:

plt.scatter(X, y, c = 'red')
plt.plot(X, lin_reg_2.predict(poly_reg.transform(X)), c = 'green')  # poly_reg is already fitted, so transform is enough
plt.title('Truth or Bluff (Polynomial Regression)')
plt.xlabel('Level')
plt.ylabel('Salary')

Out[78]:

Text(0,0.5,'Salary')
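The mission stated earlier was to check the 160K claim for level 6.5. A minimal sketch of that last step follows, assuming the same two models as above. It rebuilds the data from Out[14] and Out[15] so that it runs on its own; `level`, `lin_pred` and `poly_pred` are names introduced here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Levels and salaries copied from Out[14] and Out[15]
X = np.arange(1, 11).reshape(-1, 1)
y = np.array([45000, 50000, 60000, 80000, 110000,
              150000, 200000, 300000, 500000, 1000000])

# Same two models as in the notebook
lin_reg = LinearRegression().fit(X, y)
poly_reg = PolynomialFeatures(degree=2)
lin_reg_2 = LinearRegression().fit(poly_reg.fit_transform(X), y)

# scikit-learn expects a 2D array with one row per sample
level = np.array([[6.5]])
lin_pred = lin_reg.predict(level)[0]
# Reuse the fitted transformer: transform, not fit_transform
poly_pred = lin_reg_2.predict(poly_reg.transform(level))[0]

print(f'Linear regression:                {lin_pred:,.0f}')
print(f'Polynomial regression (degree 2): {poly_pred:,.0f}')
```

On this data the linear model predicts roughly 330K and the degree-2 model roughly 189K for level 6.5. Both are above the claimed 160K, and the polynomial estimate is much closer to it. A higher degree may track the observed points more tightly, but with only ten observations that is a judgment call rather than something this sketch settles.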